In many circumstances it is appropriate to use the
school as the unit of analysis. The variables measured on students 
must be aggregated to form a mean for each school. However, the means 
derived from the students sampled in a school will tend to fluctuate 
around the true mean for the school in a way determined by the 
within-school correlations among student variables rather than by the 
between-school correlations. A model is presented which circumvents 
this problem by obtaining replicate measures for each variable. The 
model permits estimation of the true between-schools covariance 
matrix and measurement error variances. An example employing real 
data is presented. (Author)

In many circumstances it is appropriate to use the school
as the unit of analysis as, for example, in a model comparing
treatments affecting entire schools or in a model relating
expenditures for instruction to pupil achievement. When the
school is the appropriate unit of analysis, the variables
measured on students must be aggregated to form a mean (or
other measure of central tendency) for each school. When
students are sampled in order to generate such a mean, there
may be some difficulties in using the aggregated values.

For a particular pair of variables X and Y, the sample of
students is supposed to generate a pair of sample means which
will be close to the true values for the school. Clearly,
there will be some fluctuations from sample to sample,
fluctuations which will depend upon the correlation between
these variables within the school being sampled. As the
between-schools correlation of these variables is desired in
the analysis, the data are contaminated by the within-school
correlation when a sampling procedure is used.
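
The contamination can be written out under a simple assumption that
the source does not state: each student's score is the school's true
value plus an independent within-school deviation, and n students are
sampled per school. The symbols \sigma_{B,XY} (between-schools
covariance), \sigma_{W,XY} (within-school covariance) and n are
introduced here for illustration only.

```latex
\operatorname{Cov}(\bar{X}, \bar{Y})
  = \sigma_{B,XY} + \frac{\sigma_{W,XY}}{n}
```

Under this assumption the second term vanishes only when n is large or
when X and Y are uncorrelated within schools.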

In the following example data from each of two samples of
boys obtained at each of 39 schools are analysed using a
method developed by Jöreskog. The analysis shows one way in
which the problem of aggregation may be handled.

For each of the samples of boys at each school six variables
were measured and the mean values of the observations were
computed. Thus each school was represented in the analysis
by six pairs of observations, one mean for each variable from
each sample.

Because the values of one replicate measure of all six
variables for a school are determined on the same subset of
students, the correlations between variables within the replicate
measures were systematically higher than the correlations between
variables across replicate measures. This implies the presence
of a "sampling factor" analogous to the "method factor" in
classical test construction (Campbell and Fiske, 1959).
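
The pattern described above can be illustrated with a minimal sketch,
assuming equal variances for X and Y and independent random samples of
n students per replicate. The function name and all numeric values are
hypothetical and are not taken from the study.

```python
def replicate_mean_correlations(b_var, b_cov, w_var, w_cov, n):
    """Expected correlations among school means of X and Y from two replicates.

    b_var, b_cov: between-schools variance (same for X and Y) and covariance
    w_var, w_cov: within-school variance and covariance
    n: number of students in each replicate sample
    """
    var_mean = b_var + w_var / n          # variance of an observed school mean
    same_sample_cov = b_cov + w_cov / n   # X and Y means share the same students
    cross_sample_cov = b_cov              # different students: true covariance only
    return {
        "within_replicate": same_sample_cov / var_mean,
        "across_replicate": cross_sample_cov / var_mean,
        "true_between": b_cov / b_var,
    }


# Hypothetical values chosen only to show the direction of the effect.
r = replicate_mean_correlations(b_var=1.0, b_cov=0.3, w_var=4.0, w_cov=2.0, n=10)
for name, value in r.items():
    print(f"{name}: {value:.3f}")
```

With a positive within-school covariance the within-replicate
correlation exceeds the across-replicate one, which is the signature
attributed to the sampling factor.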

In order to extract the covariance matrix of the latent
or "true" variables of interest, the method of analysis of
covariance structures (Jöreskog, 1970) was applied to these
data. Two models may be entertained to account for the observed
covariance matrix of 12 variables. The first model posits no
"sampling factor" while the second model takes account of this
feature of the data explicitly.

Both models may be expressed as parameterizations of the
matrix decomposition of the covariance matrix, Σ, given below:
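
A minimal sketch of the decomposition, assuming the usual
factor-analytic form that is consistent with the matrices Λ, Φ and Ψ
discussed in the text. The exact form is an assumption, not a
quotation from the source.

```latex
\Sigma = \Lambda \Phi \Lambda' + \Psi
```

Here Λ holds the loadings, Φ is the covariance matrix of the latent
variables, and Ψ is the diagonal matrix of measurement error variances.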



For the model with no sampling factor the parameterization is:

Ψ = Diagonal(ψ1, ψ2, ψ3, ψ4, ψ5, ψ6, ψ1, ψ2, ψ3, ψ4, ψ5, ψ6)
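
The remaining matrices for this model can be sketched from the verbal
description. This is an inference rather than a reproduction of the
original display: each of the 12 observed means is assumed to load
with weight 1.0 on its own true variable, I_6 denotes the 6 x 6
identity matrix, and Θ denotes the 6 x 6 covariance matrix of the true
variables.

```latex
\Lambda = \begin{pmatrix} I_6 \\ I_6 \end{pmatrix},
\qquad
\Phi = \Theta
```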

The ones in the matrix Λ indicate that the solution is
restricted to the value 1.0 in that location. The reason for
this restriction is provided in the discussion of the alternative
model whose parameterization is:

Where a one appears in the above matrices, the solution is
restricted to the value 1.0 for that location. Zeroes represent
locations restricted to the value 0.0 in the solution. The
matrix Λ has six columns for the trait factors in which the
loadings are restricted to 1.0. This follows from the sampling
structure for the replicate measures. Each measure consists of
the same items, merely different persons from the same school.
It seems appropriate to restrict the loading of each measure on
the underlying true variable to be 1.0. The next two columns of
Λ represent sampling factors. In order to make the parameters
associated with the sampling factors identifiable, the leading
parameter must be restricted to have a weight of 1.0 on each
sampling factor and the variance-covariance matrix of sampling
factors must be the identity matrix. In addition, the elements
of the sampling factors which are to be estimated are restricted
to be equal for both replicate measures due to the sampling
nature of the replicate measures.

The matrix Φ represents the variance-covariance matrix of
the latent (or true) measures. It consists of the submatrices Θ,
a 6 x 6 covariance matrix of the true variables of interest; 0,
the 2 x 6 matrix of sample factor by trait factor covariances
which are restricted to be null; and I, the 2 x 2 identity
matrix mentioned above. The estimate of Θ will be used in the
causal flow analysis to follow.
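
The restrictions described above can be assembled into a covariance
matrix as in the following minimal sketch. It assumes the
decomposition Sigma = Lambda Phi Lambda' + Psi, one sampling factor
per replicate, and error variances shared by the two replicates of
each variable. The function name and every numeric value are
hypothetical, not the paper's estimates.

```python
def implied_covariance(theta, loadings, psi):
    """Return the 12 x 12 covariance matrix implied by the sampling-factor model.

    theta:    6 x 6 covariance matrix of the true variables (list of lists)
    loadings: six sampling-factor loadings; the first is fixed at 1.0
    psi:      six measurement error variances, shared by both replicates
    """
    k = len(theta)  # number of variables (6 in the paper)
    # Lambda: 2k rows; k trait columns plus one sampling-factor column per replicate.
    lam = []
    for rep in range(2):
        for i in range(k):
            row = [1.0 if j == i else 0.0 for j in range(k)]
            row += [loadings[i] if s == rep else 0.0 for s in range(2)]
            lam.append(row)
    # Phi: theta in the trait block, a 2 x 2 identity for the sampling
    # factors, and zeros for the trait-by-sampling-factor covariances.
    m = k + 2
    phi = [[0.0] * m for _ in range(m)]
    for i in range(k):
        for j in range(k):
            phi[i][j] = theta[i][j]
    phi[k][k] = phi[k + 1][k + 1] = 1.0
    # Sigma = Lambda Phi Lambda' + Psi (Psi is diagonal).
    p = 2 * k
    sigma = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            total = 0.0
            for i in range(m):
                for j in range(m):
                    total += lam[a][i] * phi[i][j] * lam[b][j]
            if a == b:
                total += psi[a % k]
            sigma[a][b] = total
    return sigma


# Hypothetical values for illustration only.
theta = [[1.0 if i == j else 0.2 for j in range(6)] for i in range(6)]
loadings = [1.0, 0.1, 0.0, -0.1, 0.2, 0.9]
psi = [0.5] * 6
sigma = implied_covariance(theta, loadings, psi)
```

In this sketch the covariance between the first and sixth means is
larger within a replicate (sigma[0][5]) than across replicates
(sigma[0][11]), because only the former picks up the product of the
sampling-factor loadings.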



The matrix Ψ is also restricted to represent the sampling
nature of the replicate measures. The measurement error variances
for each replicate measure of a variable are restricted to be
equal. Using Jöreskog's program for the analysis of covariance
structures (1970), the following parameter estimates and
standard errors were found for the model with sampling factors:
    -0.04 ± 0.06
     0.16 ± 0.11
    -1.25 ± 1.61
     0.98 ± 0.25

    ψ1 = 1.03 ± 0.19
    ψ2 = 0.91 ± 0.11
    ψ3 = 0.32 ± 0.04
       = 0.65 ± 0.26

The obtained χ² of 53.8 on 46 degrees of freedom corresponds
to a probability of 0.20, which indicates a reasonably good
fit of the model to the data. The model with no sampling
factor would not converge properly to a solution. The conclusion
which this author draws from this phenomenon is that the
sampling factor is quite important to the fit of the model.
This further implies that there is a substantial problem
in the aggregation of student data to the level of school.
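
The reported degrees of freedom and probability can be checked with a
short sketch. The parameter count is an assumption inferred from the
restrictions described in the text (21 elements of the true covariance
matrix, 5 free sampling-factor loadings, 6 error variances). The helper
name is hypothetical, and the closed form it uses is valid only for
even degrees of freedom.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; exact when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2.0
    term = 1.0   # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


n_observed = 12                                  # six variables, two replicates
n_moments = n_observed * (n_observed + 1) // 2   # distinct variances and covariances
n_free = 21 + 5 + 6   # assumed: true covariances, free loadings, error variances
df = n_moments - n_free
p_value = chi_square_sf_even_df(53.8, df)
print(df, round(p_value, 2))
```

Under the assumed parameter count the degrees of freedom agree with
the 46 reported in the text.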

In the example the only coefficient of loading on the sampling
factor which appears to be significant is for the sixth
variable. Presumably, this means that the first and sixth
variables are most responsible for the existence of the
aggregation difficulty. Substantively, this makes sense as well,
for the first variable is "father's education (in years)" and
the sixth variable is "obtained test score". The correlation
of these two variables is well documented.

One caution may be put forward in the recommendation of
this method. A similar analysis was performed for girls, and
while the general result (lack of convergence for the no sample
factor model; reasonably good fit for the alternative model)
was the same, the estimated matrix Θ (the estimated true
covariance matrix of the variables of interest which would be
used in subsequent analyses) proved to have a negative latent
root. No good reason can be put forward at this time to
explain this ill-conditioned solution.
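
A negative latent root means that the estimated matrix is not a valid
covariance matrix. The following minimal sketch shows one way such an
estimate could be screened before it is used in later analyses. It
relies on a Cholesky factorization, which is a technique swapped in
here and not one the source describes, and is_positive_definite is a
hypothetical helper name.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False   # a non-positive pivot rules out positive definiteness
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# Hypothetical 2 x 2 examples: the second has latent roots 3 and -1.
print(is_positive_definite([[2.0, 1.0], [1.0, 2.0]]))
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))
```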